Descriptive Statistics

Frequency Distributions

  • FREQUENCY DISTRIBUTION
  • RELATIVE FREQUENCY DISTRIBUTION
  • PROPORTION
  • PERCENTAGE
  • CUMULATIVE
  • RATE
  • BAR GRAPH
  • HISTOGRAM
  • LINE GRAPH
  • STATISTICAL MAP

Objectives

  • Calculate proportions and percentages
  • Construct and analyze frequency, percentage, and cumulative distributions

DISTRIBUTION

Shows all the possible values (or intervals) of the data and how often they occur.


FREQUENCY DISTRIBUTION

A table reporting the number of observations falling into each category of the variable.

Table 1. Attitudes about sex before marriage

| premarsx | n |
|---|---|
| always wrong | 357 |
| almost always wrong | 122 |
| wrong only sometimes | 258 |
| not wrong at all | 1,378 |
| Total | 2,115 |

Survey question: There’s been a lot of discussion about the way morals and attitudes about sex are changing in this country. If a man and woman have sex relations before marriage, do you think it is _________.

The Total (2,115) is the number of respondents who answered this survey question.

Each category count is the number of respondents who gave that answer; for example, 258 respondents said premarital sex was “wrong only sometimes.”

RELATIVE FREQUENCY DISTRIBUTION

A table showing the proportion or percentage for each value of a variable.


Proportions range from 0 to 1.0.

\(\text{Proportion} = \frac{f}{N}\), where \(f\) is the count for a category and \(N\) is the total number of cases.


Percentages range from 0 to 100.

\(\text{Percentage} = \text{proportion} \times 100\)
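The two formulas can be checked with a short Python sketch that turns the counts from Table 1 into proportions and percentages. The helper name `relative_frequencies` is illustrative, not part of any library.

```python
def relative_frequencies(counts):
    """Return {category: (proportion, percentage)} for a dict of category counts."""
    n_total = sum(counts.values())      # N, the total number of cases
    result = {}
    for category, f in counts.items():
        proportion = f / n_total        # proportion = f / N, between 0 and 1
        percentage = proportion * 100   # percentage = proportion x 100, between 0 and 100
        result[category] = (proportion, percentage)
    return result


# Counts (f) from Table 1
counts = {
    "always wrong": 357,
    "almost always wrong": 122,
    "wrong only sometimes": 258,
    "not wrong at all": 1378,
}

for category, (p, pct) in relative_frequencies(counts).items():
    print(f"{category:<22} proportion = {p:.3f}  percentage = {pct:.1f}%")
```

Rounded to whole numbers, the percentages are 17, 6, 12, and 65, the values used in Table 3.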

CUMULATIVE FREQUENCY DISTRIBUTION

The number or percentage of observations at or below a given category.


Table 3. Attitudes about sex before marriage, with cumulative percentages

| premarsx | n | % | cumulative % |
|---|---|---|---|
| always wrong | 357 | 17 | 17 |
| almost always wrong | 122 | 6 | 23 |
| wrong only sometimes | 258 | 12 | 35 |
| not wrong at all | 1,378 | 65 | 100 |
| Total | 2,115 | 100 | |

Cumulative % at “almost always wrong”: \({\color{mathGreen} 17} + {\color{mathOrange} 6} = {\color{mathRed} 23\%}\)
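A cumulative distribution is a running total taken in the variable's order. This minimal sketch uses `itertools.accumulate` on the Table 3 counts and assumes the categories are listed from lowest to highest, as in the table.

```python
from itertools import accumulate

# Table 3 categories, in order, with their counts
categories = ["always wrong", "almost always wrong", "wrong only sometimes", "not wrong at all"]
counts = [357, 122, 258, 1378]

n_total = sum(counts)
cumulative_counts = list(accumulate(counts))   # cases at or below each category
cumulative_pct = [100 * c / n_total for c in cumulative_counts]

for cat, cum_n, cum_pct in zip(categories, cumulative_counts, cumulative_pct):
    print(f"{cat:<22} cumulative n = {cum_n:>5}  cumulative % = {cum_pct:.0f}")
```

Working from counts rather than adding already-rounded percentages avoids rounding drift, which is why some published tables end at 101 instead of 100.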

RATES

\(\text{Rate} = \frac{\text{actual occurrences}}{\text{possible occurrences}}\)
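The rate formula translates directly into a one-line function. In this sketch the numbers (45 actual out of 1,500 possible occurrences) are hypothetical, and the optional `per` parameter is an assumption reflecting the common practice of reporting rates per 1,000 or per 100,000.

```python
def rate(actual, possible, per=1):
    """Rate = actual occurrences / possible occurrences, optionally scaled by `per`."""
    if possible <= 0:
        raise ValueError("possible occurrences must be positive")
    return actual * per / possible


# Hypothetical numbers: 45 actual occurrences out of 1,500 possible occurrences
print(rate(45, 1500))             # as a plain ratio
print(rate(45, 1500, per=1000))   # scaled to "per 1,000"
```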



Nominal variables: can have frequency distributions, but not cumulative frequency distributions.


Ordinal variables: can have frequency distributions and cumulative frequency distributions.


Interval-ratio variables: can have frequency distributions, cumulative frequency distributions, and rates.

A bar graph is used:
  • for nominal or ordinal variables,
  • to show frequencies or percentages,
  • using separated rectangles, with height proportional to the frequency or percentage.

A histogram is used:
  • for interval-ratio variables,
  • to show frequencies or percentages,
  • using adjacent rectangles with no gaps between them (the intervals are continuous), with height proportional to the frequency or percentage.
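The counts behind a histogram come from sorting interval-ratio values into adjacent, equal-width intervals. The sketch below is one way to do that; the ages, the interval width of 5, and the helper name `histogram_counts` are all hypothetical.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values falling in adjacent, equal-width intervals.

    Interval i covers [start + i*width, start + (i+1)*width): it includes its
    lower limit and excludes its upper limit, so the intervals touch with no
    gaps, which is why histogram bars are drawn touching.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:          # values outside the covered range are skipped
            counts[i] += 1
    return counts


# Hypothetical ages of ten respondents
ages = [21, 23, 25, 25, 28, 31, 34, 35, 38, 42]
counts = histogram_counts(ages, start=20, width=5, n_bins=5)
for i, c in enumerate(counts):
    low = 20 + i * 5
    print(f"{low}-{low + 4}: {'#' * c} ({c})")   # labels assume whole-number ages
```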

A line graph is used:
  • for interval-ratio variables,
  • to show frequencies or percentages,
  • with a line connecting the frequency (or average) for each category.

A statistical map is used:
  • for interval-ratio variables,
  • to show geographical variations, often in rates or ratios,
  • using variations in color or shading.

Central Tendency

  • MEAN
  • MEDIAN
  • MODE
  • OUTLIER
  • PERCENTILE
  • BIMODAL
  • SYMMETRICAL DISTRIBUTION
  • POSITIVELY SKEWED DISTRIBUTION
  • NEGATIVELY SKEWED DISTRIBUTION

Objectives

  • Explain the importance of measures of central tendency.
  • Calculate and interpret the mean, the median, and the mode.
  • Identify the relative strengths and weaknesses of the three measures.
  • Determine and explain the shape of a distribution.

Summary Statistics


We use summary statistics to find out what is TYPICAL in a distribution.

MEAN

The arithmetic average obtained by adding up all the scores and dividing by the total number of scores.


  • It is the most commonly used measure of central tendency.
  • Its weakness is that it is sensitive to outliers (extreme scores in a distribution).

Finding the mean in a list: \(7, 4, 2, 8, 0, 9, 5\)

  1. Add all observations together: \(7 + 4 + 2 + 8 + 0 + 9 + 5 = 35\)
  2. Divide the sum by the number of observations: \(\frac{35}{7} = 5\)
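The two steps map onto `sum()` and `len()`. This is a minimal sketch; the function name `arithmetic_mean` is illustrative (the standard library also provides `statistics.mean`).

```python
def arithmetic_mean(values):
    if not values:
        raise ValueError("need at least one observation")
    total = sum(values)           # step 1: add all observations together
    return total / len(values)    # step 2: divide the sum by the number of observations


print(arithmetic_mean([7, 4, 2, 8, 0, 9, 5]))
```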

| Family ID | Annual Income (CAD) |
|---|---|
| F01 | $48,000 |
| F02 | $52,000 |
| F03 | $45,000 |
| F04 | $50,000 |
| F05 | $53,000 |
| F06 | $49,000 |
| F07 | $46,000 |
| F08 | $51,000 |
| F09 | $175,000 |
| F10 | $250,000 |

Most families in this sample earn between $45,000 and $53,000, but two high-income households ($175,000 and $250,000) pull the mean up to $81,900, far above what is typical.

Source: Totally fake data
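To see the mean's sensitivity to outliers, this sketch uses Python's `statistics.mean` and `statistics.median` on the income table, with and without the two high-income households. Dropping incomes above $100,000 is just a convenient way to exclude F09 and F10 here.

```python
from statistics import mean, median

# Annual incomes (CAD) from the family income table
incomes = [48_000, 52_000, 45_000, 50_000, 53_000,
           49_000, 46_000, 51_000, 175_000, 250_000]

# Same sample without the two high-income households (F09 and F10)
without_top_two = [x for x in incomes if x < 100_000]

print(f"All ten families: mean = {mean(incomes):,.0f}, median = {median(incomes):,.0f}")
print(f"Without F09, F10: mean = {mean(without_top_two):,.0f}, median = {median(without_top_two):,.0f}")
```

Removing the two outliers lowers the mean by $32,650 (from $81,900 to $49,250) but the median by only $1,000 (from $50,500 to $49,500).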

MEDIAN

The middle value of a distribution when the scores are arranged in order, so that half of the observations fall above it and half fall below it.

The median is the value at the 50th percentile in a cumulative frequency distribution.


PERCENTILE

A score below which a specific percentage of the distribution falls.

Finding the median in a list with an odd number of observations:

\(7, 2, 1, 3, 4, 1, 5, 9, 2\)

  1. Put the list in order: \(1, 1, 2, 2, 3, 4, 5, 7, 9\)
  2. Pick the center number: \(3\)

Finding the median in a list with an even number of observations:

\(2, 0, 1, 2, 5, 1, 3, 1\)

  1. Put the list in order: \(0, 1, 1, 1, 2, 2, 3, 5\)
  2. Add the two center numbers and divide by 2: \(\frac{1 + 2}{2} = 1.5\)
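Both cases (odd and even number of observations) fit in one function. This is a sketch; the name `find_median` is illustrative (the standard library's `statistics.median` follows the same rule).

```python
def find_median(values):
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)          # step 1: put the list in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: pick the center number
        return ordered[mid]
    # even count: add the two center numbers and divide by 2
    return (ordered[mid - 1] + ordered[mid]) / 2


print(find_median([7, 2, 1, 3, 4, 1, 5, 9, 2]))   # odd number of observations
print(find_median([2, 0, 1, 2, 5, 1, 3, 1]))      # even number of observations
```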

How often do the demands of your job interfere with your family life?

| wkvsfam | n | % | cumulative % |
|---|---|---|---|
| often | 218 | 11 | 11 |
| sometimes | 645 | 33 | 44 |
| rarely | 669 | 34 | 78 |
| never | 447 | 23 | 101 |
| Total | 1,979 | 101 | |

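Source: U.S. General Social Survey 2022

Percentages sum to 101 because of rounding. The cumulative percentage first passes 50 in the “rarely” row, so the median response is “rarely.”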
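For an ordinal frequency table, the median is the first category whose cumulative count reaches half of N. A minimal sketch using the wkvsfam counts; the helper name `median_category` is hypothetical, and it assumes categories are listed in their natural order. If the cumulative percentage lands exactly on 50, the median falls between two categories, and this sketch simply reports the lower one.

```python
def median_category(categories, counts):
    """Return the first category whose cumulative count reaches half of N."""
    n_total = sum(counts)
    running = 0
    for category, f in zip(categories, counts):
        running += f
        if running >= n_total / 2:    # the cumulative % has reached 50
            return category
    raise ValueError("no categories given")


# Counts from the wkvsfam table (GSS 2022), in order
categories = ["often", "sometimes", "rarely", "never"]
counts = [218, 645, 669, 447]
print(median_category(categories, counts))
```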

MODE

Category or score with the highest frequency (or percentage) in a distribution.


BIMODAL

A distribution with two peaks: two values or categories with the highest frequencies.

Finding the mode in a list:

\(7, 2, 1, 3, 4, 1, 5, 1, 2\)

  1. Put the list in order: \(1, 1, 1, 2, 2, 3, 4, 5, 7\)
  2. Pick the most frequent number: \(1\)
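Counting each value with `collections.Counter` gives the mode directly. This sketch (the function name `modes` is illustrative) returns every value tied for the highest count, so a list with two tied peaks returns two values. It only detects exact ties; a distribution can look bimodal with two humps of different heights, which this function would not flag.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency."""
    freq = Counter(values)
    if not freq:
        return []
    top = max(freq.values())
    return sorted(v for v, f in freq.items() if f == top)


print(modes([7, 2, 1, 3, 4, 1, 5, 1, 2]))   # one mode
print(modes([1, 1, 2, 3, 3]))               # two values tie: bimodal
```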

Table 01. Most of the time people…

| helpful | n | % | cumulative % |
|---|---|---|---|
| try to be helpful | 365 | 39 | 39 |
| looking out for themselves | 440 | 47 | 86 |
| depends | 137 | 15 | 101 |
| Total | 942 | 101 | |

Source: U.S. General Social Survey 2024

TIP: The mode is “looking out for themselves”: more respondents chose it (47%) than any other answer, although it is not a majority.

TIP: A bimodal distribution has two distinct humps, even if the peaks aren’t exactly the same height.
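For a frequency table, the mode is the category with the largest count. The sketch below uses the Table 01 counts and also checks whether the modal category is a majority, which is a separate question from being the mode.

```python
# Counts from Table 01 ("Most of the time people..."), GSS 2024
counts = {
    "try to be helpful": 365,
    "looking out for themselves": 440,
    "depends": 137,
}

n_total = sum(counts.values())
modal_category = max(counts, key=counts.get)   # category with the highest count
share = counts[modal_category] / n_total

print(f"Mode: {modal_category} ({share:.0%} of {n_total} respondents)")
print("Majority (more than 50%)?", share > 0.5)
```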

Choosing a Measure

The MODE is appropriate for nominal and ordinal variables.

It can be identified for interval-ratio level variables, but is often not useful.

The MEDIAN is appropriate for interval-ratio and ordinal variables.
It cannot be used for nominal level variables.

The MEAN can ONLY be determined for interval-ratio variables.

Distribution Shapes

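Positively Skewed Distribution

A few extremely high scores stretch the tail of the distribution to the right; the mean is usually pulled above the median.

Negatively Skewed Distribution

A few extremely low scores stretch the tail of the distribution to the left; the mean is usually pulled below the median.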
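A quick rule of thumb for the direction of skew is to compare the mean with the median. This sketch applies it to the family income data; the function name `skew_direction` and the `tol` tolerance parameter are assumptions, and the rule is a heuristic that does not hold for every distribution.

```python
from statistics import mean, median


def skew_direction(values, tol=0.0):
    """Rule of thumb: compare the mean with the median (tol = allowed gap for 'symmetrical')."""
    m, md = mean(values), median(values)
    if abs(m - md) <= tol:
        return "roughly symmetrical"
    if m > md:
        return "positive (tail to the right)"
    return "negative (tail to the left)"


incomes = [48_000, 52_000, 45_000, 50_000, 53_000,
           49_000, 46_000, 51_000, 175_000, 250_000]
print(skew_direction(incomes))
```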

Variability

  • RANGE
  • INTERQUARTILE RANGE
  • VARIANCE
  • STANDARD DEVIATION

Objectives

  • Explain the importance of measuring variability.
  • Calculate and interpret the range, interquartile range, variance, and standard deviation.
  • Identify the relative strengths and weaknesses of the measures.

Measures of Variability


Measures of variability describe the diversity in a distribution of an interval-ratio variable.

They reveal how spread out the values in a dataset are.

RANGE

The difference between the highest and lowest values in a distribution.

  • The strength of the range is that it is easy to calculate and simple to understand.
  • The weakness of the range is that it depends only on the lowest and highest scores, which may be atypical, so the range can be misleading.

Finding the range in a list: \(26, 23, 28, 27, 24, 25, 32, 25, 28, 25, 25, 26, 27, 26, 27, 25\)

  1. Put the list in order: \({\color{mathBlue} 23}, 24, 25, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 28, 28, {\color{mathRed} 32}\)
  2. Subtract the min from the max: \({\color{mathRed} 32} - {\color{mathBlue} 23} = 9\)
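The same two steps in Python: after sorting, the first element is the minimum and the last is the maximum. This is a minimal sketch; `max(values) - min(values)` gives the same result without sorting.

```python
values = [26, 23, 28, 27, 24, 25, 32, 25, 28, 25, 25, 26, 27, 26, 27, 25]

ordered = sorted(values)                 # step 1: put the list in order
value_range = ordered[-1] - ordered[0]   # step 2: subtract the min from the max
print(value_range)
```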

INTERQUARTILE RANGE (IQR)

The width of the middle 50% of the distribution.

IQR in a list with an odd number of observations:

\(2, 3, 3, 4, 4, 6, 7, 7, 7, 8, 9, 11, 12\)

  1. Q1 is the median of the numbers below the median: \({\color{mathOrange} 2, 3, 3, 4, 4, 6,} {\color{mathBlue} 7}, 7, 7, 8, 9, 11, 12\), so Q1 \(= \frac{3 + 4}{2} = {\color{mathOrange} 3.5}\)
  2. Q3 is the median of the numbers above the median: \(2, 3, 3, 4, 4, 6, {\color{mathBlue} 7}, {\color{mathRed}7, 7, 8, 9, 11, 12}\), so Q3 \(= \frac{8 + 9}{2} = {\color{mathRed} 8.5}\)
  3. Subtract Q1 from Q3: \({\color{mathRed} 8.5} - {\color{mathOrange} 3.5} = 5\)

IQR in a list with an even number of observations:

\(3, 4, 5, 7, 9, 10, 11, 13\)

  1. Q1 is the median of the numbers below the median: \({\color{mathOrange} 3, 4, 5, 7, } 9, 10, 11, 13\), so Q1 \(= \frac{4 + 5}{2} = {\color{mathOrange} 4.5}\)
  2. Q3 is the median of the numbers above the median: \(3, 4, 5, 7, {\color{mathRed}9, 10, 11, 13}\), so Q3 \(= \frac{10 + 11}{2} = {\color{mathRed} 10.5}\)
  3. Subtract Q1 from Q3: \({\color{mathRed} 10.5} - {\color{mathOrange} 4.5} = 6\)

TIP: The median of this list is \(\frac{7 + 9}{2} = 8\), so the lower half is \(3, 4, 5, 7\) and the upper half is \(9, 10, 11, 13\).
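The quartile method used in both examples (Q1 and Q3 as medians of the lower and upper halves, leaving out the median itself when the count is odd) can be sketched as below. The function names are illustrative. Other quartile conventions exist, and software such as Python's `statistics.quantiles` may give slightly different Q1 and Q3 for the same list.

```python
def _median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    With an odd number of observations the median is left out of both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 == 1 else ordered[half:]
    return _median(lower), _median(upper)


def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1   # subtract Q1 from Q3


odd_list = [2, 3, 3, 4, 4, 6, 7, 7, 7, 8, 9, 11, 12]
even_list = [3, 4, 5, 7, 9, 10, 11, 13]
print(quartiles(odd_list), iqr(odd_list))
print(quartiles(even_list), iqr(even_list))
```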

STANDARD DEVIATION

A measure of variation for interval-ratio variables.


Along the way to calculating the standard deviation, you calculate the variance of a distribution.

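Standard Deviation Formulas

\(\text{Variance} = \frac{\sum (X - \bar{X})^2}{N}\)

\(\text{Standard deviation} = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}\)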

Standard Deviation in 5 Steps

  1. Calculate the mean.
  2. Subtract the mean from every value (deviation from the mean).
  3. Square each “deviation from the mean.”
  4. Calculate the mean of the squared “deviations from the mean.”
  5. Take the square root of this new mean!

TIP: The mean of the squared “deviations from the mean” is the variance!

\(2,3,4,7,9\)


1. Calculate the mean \(\bar{X}\)


\(\frac{2 + 3 + 4 + 7 +9}{5} = 5\)

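2. Subtract the mean (\(\bar{X}\)) from every value (\(X\))

3. Square each difference

| \(X\) | \(X - \bar{X}\) | \((X - \bar{X})^2\) |
|---|---|---|
| 2 | -3 | 9 |
| 3 | -2 | 4 |
| 4 | -1 | 1 |
| 7 | 2 | 4 |
| 9 | 4 | 16 |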

4. Calculate the mean of the squares

\(\frac{9 + 4 +1 + 4 +16}{5} = 6.8\)

TIP: 6.8 is known as the variance!

5. Take the square root of the variance

\(\sqrt{6.8} \approx 2.6\)
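The five steps can be traced in Python for the list \(2, 3, 4, 7, 9\). This sketch divides by N, matching step 4, and cross-checks against the standard library's `statistics.pvariance` and `statistics.pstdev`, which also divide by N. Note that `statistics.variance` and `statistics.stdev` divide by N − 1 (sample formulas) and would give 8.5 and about 2.92 here.

```python
import math
import statistics

values = [2, 3, 4, 7, 9]

x_bar = sum(values) / len(values)          # 1. calculate the mean
deviations = [x - x_bar for x in values]   # 2. subtract the mean from every value
squared = [d ** 2 for d in deviations]     # 3. square each deviation
variance = sum(squared) / len(squared)     # 4. mean of the squared deviations (divides by N)
sd = math.sqrt(variance)                   # 5. square root of the variance

print(f"mean = {x_bar}, variance = {variance}, SD = {sd:.1f}")

# Standard-library population versions (also divide by N) for comparison
print(statistics.pvariance(values), statistics.pstdev(values))
```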

Interpretation template: I expect a typical [case you’re studying] to differ from the mean by about [your standard deviation].


Example: Mean: 5; SD: 2.6

I expect a typical [household] to differ from the mean [of 5 family members per household] by about [2.6 people].
